Model - based Direct Policy Search ( Extended Abstract ) Jan

نویسندگان

  • Jan Hendrik Metzen
  • Frank Kirchner
چکیده

Scaling Reinforcement Learning (RL) to real-world problems with continuous state and action spaces remains a challenge. This is partly due to the reason that the optimal value function can become quite complex in continuous domains. In this paper, we propose to avoid learning the optimal value function at all but to use direct policy search methods in combination with model-based RL instead.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Integrated Supply Chain of After-sales Services Model: A Multi-objective Scatter Search Optimization Approach

Abstract: In recent decades, high profits of extended warranty have caused that third-party firms consider it as a lucrative after-sales service. However, customers division in terms of risk aversion and effect of offering extended warranty on manufacturers’ basic warranty should be investigated through adjusting such services. Since risk-averse customers welcome extended warranty, while the cu...

متن کامل

Reward-Weighted Regression with Sample Reuse for Direct Policy Search in Reinforcement Learning

Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. Policy search often requires a large number of samples for obtaining a stable policy update estimator, and this is prohibitive when the sampling cost is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previo...

متن کامل

Neuro-Evolution for Multi-Agent Policy Transfer in RoboCup Keep-Away: (Extended Abstract)

An objective of transfer learning is to improve and speedup learning on target tasks after training on a different, but related source tasks. This research is a study of comparative Neuro-Evolution (NE) methods for transferring evolved multi-agent policies (behaviors) between multi-agent tasks of varying complexity. The efficacy of five variants of two NE methods are compared for multi-agent po...

متن کامل

Policy Search with High-Dimensional Context Variables

Direct contextual policy search methods learn to improve policy parameters and simultaneously generalize these parameters to different context or task variables. However, learning from high-dimensional context variables, such as camera images, is still a prominent problem in many real-world tasks. A naive application of unsupervised dimensionality reduction methods to the context variables, suc...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010